Robust unification for linguistics

Author

  • Frederik Fouvry
Abstract

Unification and constraint-based unification formalisms have been used widely to write natural-language grammars. There have nevertheless always been problems with the coverage of the grammars, due to too restrictive an interpretation of what the grammar describes. Several approaches have been proposed, each relaxing different requirements. Many of these, however, are ad hoc and rely on properties of a specific parser or architecture. In this paper, I present an approach that is based on a more abstract and general level, viz. the formalism. More concretely, I use a typed attribute logic, based on (Carpenter, 1992). In a first step it is ensured that the type hierarchy has certain properties that make meaningful robustness possible (i.e. robustness that retains information about what went wrong). The size of such a hierarchy is limited through modulation. Then, a distance measure based on the logic is introduced to distinguish between different analyses. Reporting problems and treatment of unknown words fall out naturally from the formalism, and existing grammars and linguistic theories can be implemented or re-used without any specific adaptations for robustness, just as in other, non-robust formalisms.

1. Linguistic theories and implementations

The formalisation of linguistic theories, like Lexical-Functional Grammar (LFG) (Dalrymple et al., 1995) and Head-Driven Phrase Structure Grammar (HPSG) (Pollard & Sag, 1987; Pollard & Sag, 1994), by means of mathematical models has made it much easier to test them through implementations. This has not only confirmed certain predictions on the linguistic side, but it has also shown that there is a general deficiency in dealing with unexpected, extra-grammatical input. In a constraint-based grammatical theory like HPSG, this becomes visible in parse failures, especially when the logical setup of the formalism is consistently pursued.
For this reason, computational linguists have worked with weaker assumptions to ensure that at least some result is obtained. One of these assumptions is that the implementations are mostly unification-based: there are no satisfactory computational accounts yet for constraint-based formalisms of that complexity. Some implementations of versions of these formalisms are the Attribute Logic Engine (ALE) (Carpenter & Penn, 1998), the Linguistic Knowledge Base (LKB) (Copestake, 1999), and ConTroll (Götz et al., 1997). Another, nearly ubiquitous technique is to use parsers that store intermediate analysis results (e.g. chart parsers). After the parse, these results are often put through a recovery module, as in (Jensen et al., 1983). A problem with the partial results is that the part of the analysis where the problem was detected is not present in the final analysis (or analyses): applications that need that information, say grammar checkers, have to reproduce it. Statistical methods, a common approach to solving the problems of undergeneration (through relaxation of the rules) and overgeneration (through ranking of the results), cannot, as far as I am aware, handle arbitrary feature structures yet.

[Figure 1: A simple type hierarchy, with the types $\top$, a, b and $\bot$]

In what follows, I present a method to re-interpret the typed formal framework of linguistic theories such that robustness and information about mistakes come with the formalism, and do not require the development of specialised modules to deal with robustness.

2. Formal preliminaries

Following (Carpenter, 1992), an inheritance hierarchy or type hierarchy consists of a set of types and a relation: $\langle \mathrm{Type}, \sqsubseteq \rangle$. The relation $\sqsubseteq$, called the subsumption relation, defines a partial ordering on the types, expressing that some are more general than others. Usually there is an explicit most general type ($\bot$) and a most specific type ($\top$).
If they are not present, it is easy to add them if necessary, because of their unique position: $\top$, for instance, is more specific than all other elements in the hierarchy. $\top$ can be interpreted as fail, and $\bot$ as true. Given the above, it follows that $\bot \sqsubseteq \top$.

The unification of two types $t$ and $s$, $t \sqcup s$, is defined as the most general type that is at least as specific as each of the two types being unified. It is important that the unification gives a unique result; many properties of the inheritance hierarchy disappear otherwise. For instance (see Figure 1), if we have a type a and the two aforementioned types $\top$ and $\bot$, then $\bot \sqcup a = a$ and $\top \sqcup a = \top$. (This also shows that $\bot$ is the neutral element under $\sqcup$, and $\top$ the null element.) If we add a fourth type b, with $\bot \sqsubseteq b \sqsubseteq \top$, then $a \sqcup b = \top$. The relation between a and b is undefined: a and b are incomparable.

Carpenter furthermore requires that type hierarchies satisfy the condition that every two types that have common subtypes have a unique most general common subtype (the bounded complete partial order (BCPO) condition; see Figure 2). The benefit of this is that the unification operation will only return one result. These hierarchies are also known as finite meet semi-lattices, which I define now.

Definition 1 (Lattice) A non-empty partially ordered set $\langle L, \sqsubseteq \rangle$ is a lattice if
  • for all $a, b \in L$: $a \sqcup b$ is defined;
  • for all $a, b \in L$: $a \sqcap b$ is defined.
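
The type unification described in Section 2 can be illustrated with a minimal Python sketch. It encodes the four-type hierarchy of Figure 1 and, for contrast, a hypothetical hierarchy that violates the BCPO condition. The dictionary encoding and the names `FIGURE_1`, `NOT_BCPO`, `at_least_as_specific`, `subsumes` and `unify` are assumptions made for illustration; they are not taken from the paper or from (Carpenter, 1992).

```python
# Illustrative sketch only: the encoding and all names are assumptions.
BOTTOM, TOP = "bottom", "top"

# Each type maps to its immediate, more specific neighbours (general -> specific).
FIGURE_1 = {BOTTOM: {"a", "b"}, "a": {TOP}, "b": {TOP}, TOP: set()}

# A hypothetical hierarchy that violates the BCPO condition: a and b share
# the incomparable subtypes c and d, so no unique most general one exists.
NOT_BCPO = {BOTTOM: {"a", "b"}, "a": {"c", "d"}, "b": {"c", "d"},
            "c": {TOP}, "d": {TOP}, TOP: set()}


def at_least_as_specific(t, hierarchy):
    """Return t together with every type that t subsumes."""
    seen, stack = set(), [t]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(hierarchy[current])
    return seen


def subsumes(s, t, hierarchy):
    """True if s is at least as general as t."""
    return t in at_least_as_specific(s, hierarchy)


def unify(s, t, hierarchy):
    """Most general type that is at least as specific as both s and t."""
    common = at_least_as_specific(s, hierarchy) & at_least_as_specific(t, hierarchy)
    # The unifier is the least element of the common subtypes, if one exists.
    least = [c for c in common if all(subsumes(c, d, hierarchy) for d in common)]
    if len(least) != 1:
        raise ValueError(f"no unique unifier for {s!r} and {t!r}")
    return least[0]


print(unify(BOTTOM, "a", FIGURE_1))  # bottom acts as the neutral element
print(unify(TOP, "a", FIGURE_1))     # top acts as the null element
print(unify("a", "b", FIGURE_1))     # incomparable types only share top
```

Calling `unify("a", "b", NOT_BCPO)` raises `ValueError` in this sketch, because c and d are both minimal among the common subtypes of a and b. This is the situation that the BCPO condition rules out.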




Publication date: 2000